fix: align Java interface dispatch across wasm/native/hybrid#1503
Conversation
… diff Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh (PostToolUse Bash): the pre-hook captures git status --porcelain to a per-worktree temp file before each Bash call; the post-hook diffs the before/after state and appends newly modified or created files to .claude/session-edits.log. This closes the gap where files written by sed -i, printf redirects, tee, heredocs, or build tools (Cargo.lock, lockfiles) were never recorded, causing guard-git.sh to emit false-positive BLOCKED errors. Closes #1457
- clojure.rs: annotate lifetime-anchor assignment to silence false-positive - cfg.rs: remove never-called start_line_of method - complexity.rs: remove never-constructed NotHandled variant; convert irrefutable if-let patterns to plain let destructures - dataflow.rs: remove never-read callee fields from CallReturn/Destructured - incremental.rs: remove never-read lang field from CacheEntry cargo check and cargo clippy both clean after these changes.
Adds .github/workflows/perf-canary.yml — a path-filtered workflow that fires on PRs touching src/extractors/, src/domain/graph/, or crates/** and runs only the incremental-benchmark suite (full build + no-op + 1-file rebuild, both engines). Catches the class of regressions that accumulated invisibly across the Phase 8.x PRs and were only detected at v3.12.0 publish time. The regression guard gains BENCH_CANARY=1 mode: raises thresholds to 50%/100%/150% (standard/noisy/WASM) and skips the build, query, and resolution suites — only incremental checks run. This absorbs shared- runner timing variance while still blocking catastrophic regressions (+98% full build, +1827% 1-file rebuild from v3.12.0). Closes #1433
On incremental builds, runPostNativeCha previously scanned all call→qualified-method edges in the DB (~12ms flat, O(graph size)), even for 1-file changes where no hierarchy or RTA evidence changed. Add two cheap indexed gate queries. Gate A checks whether any changed file introduced a class/interface/trait/struct/record node (hierarchy may have new implementors reachable from unchanged call sites). Gate B checks whether any changed file added a call edge to a class-kind target (RTA set may have grown, enabling previously filtered expansions in unchanged callers). If neither gate fires, restrict the candidate query to src.file IN changedFiles — safe because the hierarchy and instantiated set are unchanged for all other files. Full builds (isFullBuild=true) and cases where either gate fires retain the existing full-scan behaviour. Mirrors the changed-files scoping pattern of runPostNativeThisDispatch. Closes #1441
Times each JS post-pass in tryNativeOrchestrator and exposes the
measurements in BuildResult.phases:
- gapDetectMs — dropped-language gap detection + backfill
- chaMs — CHA expansion (interface dispatch)
- thisDispatchMs — this/super dispatch WASM re-parse (was already
tracked but now properly named alongside the rest)
- reclassifyMs — scoped role re-classification after edge insertion
- techniqueBackfillMs — technique-column UPDATE on native-written edges
Previously only thisDispatchMs was reported, causing wall-clock vs
phaseSum to diverge by 1.1s+ on 1-file rebuilds and making benchmark
regressions undiagnosable from committed history.
Updates update-incremental-report.ts to render the new phases in a
collapsible details block under each engine's 1-file rebuild section.
Closes #1434
…ld for required-tier grammars The docstring claimed pool cost was "amortised over enough parse work" — measurements show IPC overhead scales linearly (~55–64ms/file pool vs ~8–10ms/file inline). The real motivation is crash safety for exotic WASM grammars (#965); JS/TS/TSX (required-tier, used in all this-dispatch backfill calls) have never triggered the V8 fatal crash class and are safe to run inline. Raise threshold 16 → 32 to keep typical this-dispatch batches (≤ 18 files on the codegraph corpus) on the inline fast path. Exotic-language drops are almost always well under 32 files and also benefit from the inline path without meaningful crash risk increase. Closes #1435
…e incremental rebuilds On 1-file native incremental builds, two JS post-passes ran unconditionally even when they had no work to do: - `backfillNativeDroppedFiles`: called whenever changedCount > 0, even when detectDroppedLanguageGap returned an empty gap. Gate now checks gap.missingAbs.length > 0 || gap.staleRel.length > 0 directly, matching backfillNativeDroppedFiles's own internal early-exit guard. - Node/edge COUNT(*) re-count: ran unconditionally after all post-passes even when none of them wrote any edges. COUNT(*) over 50K+ edge tables is non-trivial, especially via the NativeDbProxy napi-rs round-trip. Now gated on postPassWroteData (backfill | CHA edges | this-dispatch edges). Closes #1454
The post-pass it timed (runPostNativePrototypeMethods) was deleted in b5c03a2 when func-prop extraction moved to Rust (#1432). The optional field was never set by any code path that survived the deletion. Also remove the stale reference to "prototype-methods post-pass" from the parseFilesWasmForBackfill docstring — only the this-dispatch post-pass uses symbolsOnly now. Closes #1432
… collision Field type annotations (`private repo: OrderRepository`) were seeded as bare file-wide typeMap keys, causing `this.repo` inside `UserService` to resolve to `OrderRepository` when both classes had a `repo` field (issue #1458). Both extractors (TS `handleFieldDefTypeMap` and Rust `field_definition` branch) now seed `ClassName.field` keys at confidence 0.9, matching the `CallerClass.X` resolver fallback added in PR #1382. Bare keys are kept at confidence 0.6 as fallbacks for single-class files or class expressions where no enclosing class name is available. Both engines change identically — parity preserved.
…ed names
The resolution benchmark uses WASM-built graphs where the Elixir, Julia,
and Objective-C extractors emit module-qualified symbol names (Main.run,
App.main, UserService.create_user, etc.). The expected-edges manifests
were written with bare unqualified names (run, main, create_user), so
every correctly-resolved edge appeared as a false positive and every
expected edge appeared as a false negative — causing all three languages
to show 0% precision even though resolution was working correctly.
Root cause: starting in v3.12.0, cross-module call resolution began working
for these languages (via the improved receiver-dispatch and same-class
fallback in resolveByMethodOrGlobal / build-edges.ts). With 0 edges
previously resolved, the name mismatch was invisible; once edges started
resolving, the manifests showed 17 FP (elixir), 11 FP (julia), 6 FP
(objc) — all correctly resolved edges misidentified as false positives.
Fix:
- Update all three expected-edges.json manifests to use the
module-qualified names matching actual extractor output:
elixir: Main.run, UserService.create_user, Validators.validate_user, etc.
julia: App.main, Service.create_user, Repository.new_repo, etc.
objc: full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.)
plus add main -> run (plain C call correctly resolved)
- Ratchet THRESHOLDS for all three:
elixir: precision 0.0 -> 1.0, recall 0.0 -> 0.8 (17/21 resolved)
julia: precision 0.0 -> 1.0, recall 0.0 -> 0.7 (11/15 resolved)
objc: precision 0.0 -> 1.0, recall 0.0 -> 0.4 (6/13 resolved)
Remaining FNs are genuine unresolved edges (same-file bare calls in
elixir/julia, receiver-typed message sends in objc) — not regressions.
Closes #1447
The JS C++ and CUDA extractors had no handler for 'declaration' AST nodes,
so typeMap was never seeded for statically-typed locals (e.g. 'UserService svc;').
Without a typeMap entry for 'svc', resolveReceiverEdge had nothing to look up and
silently skipped the receiver edge.
Add handleCppDeclaration / handleCudaDeclaration to both extractors. They mirror
match_c_family_type_map ('declaration' branch) from the native Rust path: extract
the type node text and seed typeMap[varName] = { type, confidence: 0.9 } for each
identifier or init_declarator child. Primitive types (int, char, bool, …) are
skipped to avoid spurious edges.
parity-compare.mjs --langs cpp,cuda --hybrid: PARITY OK (wasm = native = hybrid)
All 3044 tests pass.
… in Rust solver
Go extractor was only seeding typeMap for var_spec and parameter_declaration,
missing short_var_declaration. Added infer_short_var_types to handle:
- x := Struct{} → conf 1.0 (composite literal)
- x := &Struct{} → conf 1.0 (address-of composite)
- x := NewFoo() / x := pkg.NewFoo() → conf 0.7 (New* factory prefix)
Python extractor was only seeding typeMap for typed_parameter and
typed_default_parameter, missing plain assignment. Added
infer_py_assignment_type to handle:
- order = Order(...) → conf 1.0 (uppercase constructor)
- obj = Module.Class(...) → conf 0.7 (uppercase module prefix, non-builtin)
Both mirror the existing JS extractors exactly. Parity check for
go and python: wasm vs native/hybrid OK.
…l, zig) Both engines used different rules for attributing calls inside variable bindings: WASM: attributed to the narrowest enclosing span regardless of kind, so local variable declarations inside fn main() shadowed the enclosing function (Zig: calls attributed to repo/svc variables instead of main), and nested let-bindings inside a Haskell do-block shadowed the top-level main binding. Native: loaded allNodes from a query that excluded 'variable' kind, so top-level Haskell bind nodes (main = do …, kind='variable') never matched in defs_with_ids, causing all calls to fall back to the file node. Unified rule implemented in findCaller (TS) and find_enclosing_caller (Rust): - Function/method definitions are preferred over any variable/constant binding as the enclosing caller scope — local var declarations inside a function body never shadow the enclosing function (fixes Zig repo/svc attribution). - When no function/method encloses the call, fall back to the WIDEST (outermost) variable/constant binding — this handles Haskell where main is a top-level bind node with kind 'variable'. Widest span is used so that nested let-bindings do not shadow the outer main binding. - File node remains the absolute last resort. Also adds 'variable' to NODE_KIND_FILTER_SQL (JS) and EDGE_NODE_KIND_FILTER (Rust pipeline.rs) so top-level variable bindings are included in the allNodes set available for caller matching. parity-compare.mjs --langs haskell,zig --hybrid: PARITY OK — 2/2 fixtures.
…and test Remove unused TypeMapEntry import from cpp.ts and cuda.ts, reformat primitive-type Set literals and test expect() calls to satisfy biome line-length rules.
Java was the only fixture where all three build paths (wasm, native, hybrid) disagreed pairwise. Bug 1 — WASM typeMap pollution: `handleJavaLocalVarDecl` used last-wins Map.set(), so the local `InMemoryUserRepository repo` in the static `createDefault()` method silently overrode the constructor parameter `UserRepository repo`. This caused WASM to bypass the interface and resolve directly to the concrete class, producing no interface edge and the wrong receiver. Fix: switch to first-wins `setTypeMapEntry` to match Rust extractor semantics. First-wins preserves the interface annotation that drives correct CHA dispatch. Bug 2 — native vs wasm/hybrid confidence mismatch: `runPostNativeCha` (native orchestrator path) used `computeConfidence − CHA_DISPATCH_PENALTY = 0.7 − 0.1 = 0.6`, while `runChaPostPass` (DB post-pass used by wasm and hybrid) hardcodes 0.8. Fix: align `runPostNativeCha` to also use 0.8. Result: all three build paths now emit identical edges and confidences. `parity-compare.mjs --langs java --hybrid` passes. Updated expected-edges.json to include both the interface declaration edge (TypeRepository.X at 0.7) and the CHA-expanded impl edge (InMemoryUserRepository.X at 0.8), which are the correct semantics for an interface-typed receiver. Closes #1469
|
Claude finished @carlos-alm's task —— View job PR Review: fix/java-interface-dispatch-1469
Phase 1 — Root Cause AnalysisBug 1 (WASM typeMap pollution): Confirmed. Bug 2 (confidence mismatch): Confirmed. Verdict: Both root causes are genuine, the fixes address the actual sources of divergence, and neither fix papers over symptoms. Phase 2 — Configuration Change Detection
No linting rules relaxed. No strictness changes. Phase 3 — Test Integrity
No red flags found. Phase 4 — Code Quality
Scope bloat in the diff: The PR title says "align Java interface dispatch" but the diff includes significant changes to:
These changes are not explained in the PR description. This is a significant scope concern — the PR title only claims Java interface dispatch fixes, but the actual diff is substantially larger. Elixir/Julia/ObjC expected-edges.json: These are bulk updates to expected edges (46+46+38+31 lines changed). Are these corrections of pre-existing wrong edges, or coincidental updates? No explanation is given. Phase 5 — Backlog ComplianceZero-dep: ✓ — No new Foundation-aligned: ✓ — All changes are local analysis, SQLite, tree-sitter. No cloud calls, no external services, no agents. Aligns with P1 (always-current) and P4 (zero-cost core). Problem-fit: 5 — Directly prevents hallucinated dependency edges (wrong receiver type = wrong call graph = AI navigates to wrong implementation). The interface dispatch bug was causing AI agents to see Breaking: No — purely fixes wrong edges to correct edges. No API, CLI, schema, or MCP contract changes. Tier: 1 — Zero new deps, fully foundation-aligned. Critical Concerns
Final Assessment |
Greptile SummaryThis PR fixes two root causes of parity divergence between the wasm, native, and hybrid build paths for Java (and C++/CUDA): a last-wins
Confidence Score: 5/5Safe to merge — all four changes are narrowly scoped bug fixes with no new logic surface area. The typeMap fix (first-wins via No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant W as WASM / Hybrid path
participant N as Native path
participant H as helpers.ts
note over H: CHA_DISPATCH_CONFIDENCE = 0.8
W->>H: runChaPostPass()
H-->>W: "edge confidence = CHA_DISPATCH_CONFIDENCE (0.8)"
N->>H: runPostNativeCha() [imports constant]
H-->>N: "edge confidence = CHA_DISPATCH_CONFIDENCE (0.8)"
note over W,N: Both paths now emit identical confidence on CHA-expanded edges
Reviews (11): Last reviewed commit: "fix: resolve merge conflicts with main" | Re-trigger Greptile |
Codegraph Impact Analysis4 functions changed → 10 callers affected across 7 files
|
…TCH_CONFIDENCE constant Switch handleCppDeclaration and handleCudaDeclaration from last-wins ctx.typeMap.set() to first-wins setTypeMapEntry(), fixing the same flat typeMap pollution bug this PR corrects in java.ts. Extract the CHA dispatch confidence value to a named CHA_DISPATCH_CONFIDENCE constant in helpers.ts so runChaPostPass and the native orchestrator share a single source of truth instead of two synchronized magic numbers.
|
Addressed review feedback in commit 4aced08:
|
…nds (#1503) The incremental scoping gate in runPostNativeCha only checked tgt.kind = 'class' for new RTA evidence, but the adjacent two-tier RTA collection also falls back to constructor/function kinds for databases written by older native engine versions. On such incremental builds Gate B would never fire, causing the scan to be wrongly scoped to changed files and missing CHA expansion for unchanged-file call sites.
|
Addressed the Gate B incremental scoping concern from the Greptile summary in commit 7eda366: The Gate B query in |
|
Addressed both concerns from the Claude review:
The minor comment about |
|
Claude finished @carlos-alm's task —— View job Follow-up Review: fix/java-interface-dispatch-1469
Verification ResultsAll three critical items from the previous review have been correctly implemented: 1.
2. C++/CUDA
3. Gate B kind extension ✅
Updated AssessmentThe two critical concerns from the initial review are resolved. The three changes are narrowly scoped and technically correct:
Rating: ⭐⭐⭐⭐ (4/5) The one remaining 3→4 star gap: the verbose comment blocks added to View job | Branch: fix/java-interface-dispatch-1469 |
|
Resolved merge conflicts with main in commit 733b0d5. Two files had conflicts:
|
Java was the only fixture where all three build paths (wasm, native, hybrid) disagreed pairwise on the same edges. Two distinct bugs caused this.
Bug 1 — WASM typeMap pollution (
src/extractors/java.ts)handleJavaLocalVarDeclused last-winsMap.set(), so a local variable declaration in one method (e.g.InMemoryUserRepository repoin the staticcreateDefault()) silently overrode an earlier binding set by a constructor parameter (UserRepository repo). The typeMap is flat per-file without method scoping, so this pollution propagated to all instance methods.Result: WASM resolved
repo.save()directly toInMemoryUserRepository.save(bypassing the interface), emitting no interface declaration edge and the wrong receiver node (InMemoryUserRepository(class)instead ofUserRepository(interface)).Fix: switch to
setTypeMapEntry(first-wins on tie), matching Rust extractor semantics. First-wins preserves the interface annotation that drives correct CHA dispatch.Bug 2 — native vs wasm/hybrid confidence mismatch (
native-orchestrator.ts)runPostNativeCha(native orchestrator path) computed CHA dispatch confidence ascomputeConfidence − CHA_DISPATCH_PENALTY = 0.7 − 0.1 = 0.6. The DB-levelrunChaPostPass(used by wasm and hybrid) hardcodes0.8. These two code paths disagreed on the confidence of every CHA-expanded interface dispatch edge.Fix: align
runPostNativeChato use the same value asrunChaPostPassby extractingCHA_DISPATCH_CONFIDENCE = 0.8as a named constant inhelpers.ts, imported by both callers. This removes the implicit coupling between two previously synchronized magic numbers.Accompanying fixes (discovered while achieving full parity)
C++/CUDA typeMap first-wins (
src/extractors/cpp.ts,src/extractors/cuda.ts):handleCppDeclarationandhandleCudaDeclarationhad the same last-winsMap.set()bug as Java. Both now usesetTypeMapEntry(first-wins on tie), consistent with all other WASM extractors and the Rust extractor semantics.Go factory / Python constructor receiver inference (
crates/codegraph-core/src/extractors/go.rs,python.rs):The Rust solver was not inferring receiver types for Go
:=short-variable assignments from factory calls (NewFoo()→*Foo) or Python__init__constructor calls (self.x = SomeClass()). Addedinfer_short_var_types(Go) and constructor-return inference (Python) to close parity gaps with WASM in those two languages.Enclosing-caller attribution for variable bindings (
src/domain/graph/builder/call-resolver.ts):findCallernow uses a two-pass strategy: narrowest enclosing function/method wins; widest variable/constant binding is the fallback. This fixes incorrect attribution for Haskell top-level bindings and Zig const-fn patterns where the call site is inside a variable initializer rather than a named function body.Confidence-descending call target sort (
src/domain/graph/builder/stages/build-edges.ts):Call targets are now sorted by confidence (desc) before emission to match native engine output order, fixing non-deterministic edge ordering that caused intermittent parity test failures.
Class-scoped field annotation typeMap keys (
src/extractors/javascript.ts):handleFieldDefTypeMapnow emitsClassName.fieldas the primary key (confidence 0.9) with barefield/this.fieldas fallbacks (0.6), preventing cross-class typeMap collision when two classes declare identically-named fields (issue #1458).Gate B RTA check extended to constructor/function kinds (
native-orchestrator.ts):The incremental Gate B query now checks
tgt.kind IN ('class', 'constructor', 'function')instead of onlytgt.kind = 'class'. On incremental builds against older-schema DBs where constructor calls are stored underconstructor/functionkinds, Gate B now correctly fires when RTA evidence is present in changed files.Elixir/Julia/ObjC expected-edges updated to module-qualified names:
These three fixture manifests were using unqualified function names as edge targets (e.g.
"to": "greet") but the extractor now emits module-qualified names ("to": "Greeter.greet"). Updated to match current extractor output — these are corrections of stale ground truth, not regressions.Performance and CI:
symbolsOnlyflag now propagates throughparseFilesWasmInline(was silently ignored on that path)INLINE_BACKFILL_THRESHOLDraised from 16 → 32 for required-tier grammarsrunPostNativeChascoped to changed files on incremental buildsresult.phasesperf-canary.ymlworkflow gates extractor/graph/native PRs at 50% regression threshold (wider than the full CI gate to account for runner variance)Outcome
parity-compare.mjs --langs java --hybridgoes green (all three paths: 48 nodes, 81 edges, identical)expected-edges.jsonto include both the interface declaration edge (UserRepository.Xat conf=0.7) and the CHA-expanded impl edge (InMemoryUserRepository.Xat conf=0.8) — the correct semantics for an interface-typed receiverCloses #1469